18.05 Lecture 33 
May 4, 2005 



Simple goodness-of-fit test: 




Decision Rule: 



$\delta = \{H_1: T \le c;\ H_2: T > c\}$

If the distribution is continuous or has infinitely many discrete points: 
Hypotheses: $H_1: \mathbb{P} = \mathbb{P}_0$; $H_2: \mathbb{P} \ne \mathbb{P}_0$




Discretize the distribution into intervals, and count the points in each interval. 

Under $\mathbb{P}_0$ the probability of each interval is known (it is the area under the density over that interval), so consider a finite number of intervals.

This discretizes the problem. 

New Hypotheses: $H_1': p_i = \mathbb{P}(X \in I_i) = \mathbb{P}_0(X \in I_i)$; $H_2'$: otherwise.
If $H_1$ is true, $H_1'$ is also true.

Rule of Thumb: 

$n p_i^0 = n \mathbb{P}_0(X \in I_i) \ge 5$

If $n p_i^0$ is too small, points are too unlikely to land in the interval,
and $T$ is not well approximated by the chi-square distribution.

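Example 9.1.2 Data ~ $N(3.912, 0.25)$, $n = 23$
$H_1: \mathbb{P} \sim N(3.912, 0.25)$
Choose $k$ intervals $\Rightarrow p_i^0 = \frac{1}{k}$
$n \left(\frac{1}{k}\right) \ge 5 \Rightarrow \frac{23}{k} \ge 5$, $k = 4$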
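
The choice of k above can be checked with a short sketch. It assumes equal-probability intervals, so each expected count is n/k; the function name `max_equal_intervals` is only illustrative and not from the lecture.

```python
def max_equal_intervals(n, min_expected=5):
    """Largest k with n * (1/k) >= min_expected (rule of thumb n * p_i^0 >= 5)."""
    return int(n // min_expected)

k = max_equal_intervals(23)
print(k, 23 / k)  # k = 4, expected count 5.75 per interval
```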
$X \sim N(3.912, 0.25) \Rightarrow \frac{X - 3.912}{\sqrt{0.25}} \sim N(0, 1)$
Dividing points: $c_1$, $c_2 = 3.912$, $c_3$

Find the normalized dividing points by the following relation:

$c_i' = \frac{c_i - 3.912}{0.5}$



The $c_i'$ values are the quartiles of the standard normal distribution.
$\Rightarrow c_1' = -0.674 \Rightarrow c_1 = -0.674(0.5) + 3.912 = 3.575$
$\Rightarrow c_2' = 0 \Rightarrow c_2 = 0(0.5) + 3.912 = 3.912$
$\Rightarrow c_3' = 0.674 \Rightarrow c_3 = 0.674(0.5) + 3.912 = 4.249$
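
As a rough check of the dividing points, the sketch below computes the quartiles of N(3.912, 0.25) directly with Python's standard-library `statistics.NormalDist`. The helper name `dividing_points` is an assumption made for illustration.

```python
from statistics import NormalDist

def dividing_points(mu, sigma, k):
    """Cut points splitting N(mu, sigma^2) into k intervals of probability 1/k each."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(i / k) for i in range(1, k)]

# sigma = sqrt(0.25) = 0.5, k = 4 intervals
cuts = dividing_points(3.912, 0.5, 4)
print([round(c, 3) for c in cuts])
```

Each cut is the standard normal quantile scaled by 0.5 and shifted by 3.912, which is the relation for $c_i'$ solved for $c_i$.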

Then, count the number of data points in each interval. 
Data: $N_1 = 3$, $N_2 = 4$, $N_3 = 8$, $N_4 = 8$; $n = 23$
Calculate the T statistic:

$T = \frac{(3 - 23(0.25))^2}{23(0.25)} + \ldots + \frac{(8 - 23(0.25))^2}{23(0.25)} = 3.609$

Now, decide if T is too large.
$\alpha = 0.05$ - significance level.
$\chi^2_{k-1} \Rightarrow \chi^2_3$, $c = 7.815$
Decision Rule:

$\delta = \{H_1: T \le 7.815;\ H_2: T > 7.815\}$

$T = 3.609 < 7.815$, conclusion: accept $H_1$
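
The arithmetic for this example can be reproduced with a minimal sketch. The counts and the threshold 7.815 come from the notes; the function name `pearson_statistic` is an illustrative assumption.

```python
def pearson_statistic(counts, probs):
    """T = sum over intervals of (N_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))

counts = [3, 4, 8, 8]   # N_1..N_4 from the example
probs = [0.25] * 4      # p_i^0 = 1/k with k = 4
T = pearson_statistic(counts, probs)
c = 7.815               # upper 5% point of chi-square with 3 degrees of freedom
decision = "accept H1" if T <= c else "reject H1"
print(round(T, 3), decision)
```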

The distribution is relatively uniform among the intervals. 

Composite Hypotheses: 

$H_1: p_i = p_i(\theta)$, $i \le r$, for $\theta \in \Theta$ - parameter set.
$H_2$: not true for any choice of $\theta$



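Step 1: Find $\theta$ that best describes the data.
Find the MLE of $\theta$

Likelihood Function: $\varphi(\theta) = p_1(\theta)^{N_1} p_2(\theta)^{N_2} \times \ldots \times p_r(\theta)^{N_r}$
Take the log of $\varphi(\theta)$ $\Rightarrow$ maximize $\Rightarrow \theta^*$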

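Step 2: See if the best choice of $\theta$ is good enough.
$H_1: p_i = p_i(\theta^*)$ for $i \le r$, $H_2$: otherwise.

$T = \sum_{i=1}^{r} \frac{(N_i - n p_i(\theta^*))^2}{n p_i(\theta^*)} \sim \chi^2_{r-s-1}$ (approximately, for large $n$)

where $s$ is the dimension of the parameter set, i.e. the number of free parameters.
Example: $N(\mu, \sigma^2)$, $s = 2$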

A lot of free parameters makes the set of distributions more flexible, so it fits the data more closely.
Subtract out this flexibility by lowering the degrees of freedom.



Decision Rule:

$\delta = \{H_1: T \le c;\ H_2: T > c\}$
Choose $c$ from $\chi^2_{r-s-1}$ with upper-tail area $= \alpha$
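
The degrees-of-freedom bookkeeping can be sketched as follows. The lookup table holds standard upper 5% chi-square points for small degrees of freedom; the names `CHI2_CRIT_05` and `composite_decision` are assumptions for illustration only.

```python
# Standard upper 5% points of the chi-square distribution for df = 1..4.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def composite_decision(T, r, s, crit=CHI2_CRIT_05):
    """Return (df, c, accepted hypothesis) for r categories and s free parameters."""
    df = r - s - 1
    if df not in crit:
        raise ValueError("no tabulated critical value for df = %d" % df)
    c = crit[df]
    return df, c, ("H1" if T <= c else "H2")

# Simple hypothesis from the earlier example: r = 4 intervals, s = 0 parameters.
print(composite_decision(3.609, r=4, s=0))
```

With s = 0 this reduces to the simple test with r - 1 degrees of freedom; each estimated parameter lowers the threshold's degrees of freedom by one.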




















Example: (pg. 543) 
Gene has 2 possible alleles $A_1$, $A_2$
Genotypes: $A_1A_1$, $A_1A_2$, $A_2A_2$
Test that $\mathbb{P}(A_1) = \theta$, $\mathbb{P}(A_2) = 1 - \theta$,
but you only observe genotype.

$H_1$: $\mathbb{P}(A_1A_2) = 2\theta(1 - \theta) \leftarrow N_2$
$\mathbb{P}(A_1A_1) = \theta^2 \leftarrow N_1$
$\mathbb{P}(A_2A_2) = (1 - \theta)^2 \leftarrow N_3$
$r = 3$ categories,
$s = 1$ (only 1 parameter, $\theta$)

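$\varphi(\theta) = (\theta^2)^{N_1} (2\theta(1 - \theta))^{N_2} ((1 - \theta)^2)^{N_3} = 2^{N_2} \theta^{2N_1 + N_2} (1 - \theta)^{2N_3 + N_2}$

$\log \varphi(\theta) = N_2 \log 2 + (2N_1 + N_2) \log \theta + (2N_3 + N_2) \log(1 - \theta)$

$\frac{\partial}{\partial \theta} \log \varphi(\theta) = \frac{2N_1 + N_2}{\theta} - \frac{2N_3 + N_2}{1 - \theta} = 0$

$(2N_1 + N_2)(1 - \theta) - (2N_3 + N_2)\theta = 0$

$\theta^* = \frac{2N_1 + N_2}{2N_1 + 2N_2 + 2N_3} = \frac{2N_1 + N_2}{2n}$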
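
A quick numerical sanity check of the closed-form maximizer: the sketch below compares $(2N_1 + N_2)/(2n)$ with a grid search over the log-likelihood. The genotype counts are hypothetical, chosen only for illustration, and the function names are assumptions.

```python
import math

def log_likelihood(theta, N1, N2, N3):
    """log of 2^N2 * theta^(2 N1 + N2) * (1 - theta)^(2 N3 + N2)."""
    return (N2 * math.log(2)
            + (2 * N1 + N2) * math.log(theta)
            + (2 * N3 + N2) * math.log(1 - theta))

def theta_mle(N1, N2, N3):
    """Closed-form maximizer (2 N1 + N2) / (2 n)."""
    n = N1 + N2 + N3
    return (2 * N1 + N2) / (2 * n)

N1, N2, N3 = 20, 50, 30  # hypothetical counts, not from the lecture
theta_star = theta_mle(N1, N2, N3)

# Grid search over (0, 1) in steps of 0.001 as an independent check.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, N1, N2, N3))
print(theta_star, theta_grid)
```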

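Compute $\theta^*$ based on data.

$p_1^0 = \theta^{*2}$, $p_2^0 = 2\theta^*(1 - \theta^*)$, $p_3^0 = (1 - \theta^*)^2$

$T = \sum_{i=1}^{3} \frac{(N_i - n p_i^0)^2}{n p_i^0} \sim \chi^2_{r-s-1} = \chi^2_1$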




For $\alpha = 0.05$, $c = 3.841$ from the $\chi^2_1$ distribution.
Decision Rule:

$\delta = \{H_1: T \le 3.841;\ H_2: T > 3.841\}$
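
The whole genotype test can be put together in a short sketch. The counts below are hypothetical (the notes do not list the data), and `genotype_test` is an illustrative name. The threshold is computed from the fact that a $\chi^2_1$ variable is the square of a standard normal, so $c$ is the square of the 0.975 normal quantile.

```python
from statistics import NormalDist

def genotype_test(N1, N2, N3, alpha=0.05):
    """Chi-square goodness-of-fit test with theta estimated by its MLE."""
    n = N1 + N2 + N3
    theta = (2 * N1 + N2) / (2 * n)                      # MLE from the derivation
    probs = [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]
    counts = [N1, N2, N3]
    T = sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))
    # df = r - s - 1 = 3 - 1 - 1 = 1; chi-square_1 is Z^2 for standard normal Z.
    c = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    return theta, T, c, ("H1" if T <= c else "H2")

# Hypothetical counts: N1 = 30, N2 = 40, N3 = 30 (n = 100).
theta, T, c, decision = genotype_test(30, 40, 30)
print(theta, T, round(c, 3), decision)
```

For these counts $\theta^* = 0.5$, the expected counts are 25, 50, 25, and $T = 1 + 2 + 1 = 4 > 3.841$, so the sketch lands on $H_2$.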
** End of Lecture 33 